How to deal with heterogeneous data?

Author: Roche Mathieu
Publication venue: HAL CCSD
Publication date: 01/01/2015

International audience. The Big Data issue is traditionally characterized in terms of the three Vs, i.e. volume, variety, and velocity. This paper focuses on the variety criterion, which is a challenging issue

Available from: HAL Descartes, Agritrop, HAL-CIRAD

How to exploit paralinguistic features to identify acronyms in texts?

Author: Roche Mathieu
Publication venue: HAL CCSD
Publication date: 01/01/2014

International audience. This paper addresses the issue of building acronym dictionaries. The first step of the process identifies acronym/definition candidates; the second selects among these candidates using a letter alignment method. This approach has two advantages: it enables (1) annotating documents and (2) building domain-specific dictionaries. More precisely, this paper discusses the use of a specific linguistic concept, the gloss, to identify candidates. The proposed method, based on paralinguistic markers, is language-independent.
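The two-step process described above (candidate identification, then letter alignment) can be illustrated with a minimal sketch. It assumes that the only paralinguistic marker is an acronym glossed in parentheses right after its definition, e.g. "natural language processing (NLP)", and it uses a simple greedy in-order letter match; the regular expressions, the look-back window size, and the function names `extract_acronyms` and `letters_align` are hypothetical illustrations, not the author's implementation.

```python
import re

# Hypothetical marker: an acronym in parentheses directly after its definition.
ACRONYM_IN_PARENS = re.compile(r"\(([A-Z][A-Za-z]{1,9})\)")
WORD = re.compile(r"[A-Za-z][\w-]*")


def letters_align(acronym: str, words: list[str]) -> bool:
    """Return True if the acronym's letters occur in order in the words,
    with the first letter matching the initial of the first word."""
    letters = acronym.lower()
    if not words or words[0][0].lower() != letters[0]:
        return False
    text = " ".join(words).lower()
    pos = 1  # the first letter is already matched by words[0][0]
    for ch in letters[1:]:
        pos = text.find(ch, pos)
        if pos == -1:
            return False
        pos += 1
    return True


def extract_acronyms(text: str) -> dict[str, str]:
    """Step 1: collect candidates (acronym in parentheses + preceding words).
    Step 2: keep the shortest preceding word sequence that aligns with the acronym."""
    dictionary: dict[str, str] = {}
    for match in ACRONYM_IN_PARENS.finditer(text):
        acronym = match.group(1)
        preceding = WORD.findall(text[: match.start()])
        # Hypothetical look-back window, bounded by the acronym length.
        window = preceding[-min(len(acronym) + 5, 2 * len(acronym)):]
        # Try the shortest candidate definition first, then extend leftwards.
        for start in range(len(window) - 1, -1, -1):
            candidate = window[start:]
            if letters_align(acronym, candidate):
                dictionary[acronym] = " ".join(candidate)
                break
    return dictionary


if __name__ == "__main__":
    sample = ("Results were reported by the World Health Organization (WHO). "
              "We apply natural language processing (NLP) to news.")
    print(extract_acronyms(sample))
```

Preferring the shortest aligned definition avoids absorbing unrelated words that precede the real expansion ("by the World Health Organization" reduces to "World Health Organization"); a real system would need further markers and filters to handle other gloss forms.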

Available from: HAL Descartes, Agritrop, HAL-CIRAD

La fouille de textes au service de la documentation [Text mining in the service of documentation]

Authors: Fortuno Sophie, Roche Mathieu
Publication venue: Babes-Bolyai University
Publication date: 01/01/2014

Popular science article. National audience. The masses of textual data available today raise a specific problem related to their automatic processing. Text mining and natural language processing methods can partly address this difficulty. An overview of these methods and of the new challenges ahead, presented by two researchers from Cirad, the French research centre that works with countries of the Global South to address international issues in agriculture and development.

Available from: HAL Descartes, HAL-CIRAD

Traitement automatique des données hétérogènes liées à l'aménagement des territoires [Automatic processing of heterogeneous data related to land-use planning]

Authors: Roche Mathieu, Teisseire Maguelonne
Publication venue: HAL CCSD
Publication date: 01/01/2015

National audience. The notion of land-use planning (aménagement du territoire) refers to various concepts such as spatial and temporal information, stakeholders, opinions, history, politics, etc. Today, with the development of digital technologies (blogs, forums, social networks, etc.), all the stakeholders involved express themselves, and the textual documents they produce constitute a considerable source of information that it is crucial to analyse. In this article, we lay the first foundations of an automatic knowledge extraction method for analysing the feelings (opinions and/or sentiments) of the stakeholders involved, based on a corpus of highly heterogeneous data built specifically for a given territory. Such an approach, which falls within the field of data science, will give decision-makers and users of a territory an environment in which they can obtain the keys to understanding it and assess all its stakes and contours.

Available from: HAL Descartes, Agritrop, HAL-CIRAD

Exploiting textual source information for epidemiosurveillance

Authors: Arsevska Elena, Dufour Barbara, Hendrikx Pascal, Lancelot Renaud, Roche Mathieu
Publication venue: Springer International Publishing
Publication date: 01/01/2014

In recent years, as a complement to traditional surveillance reporting systems, there has been great interest in developing methodologies for the early detection of potential health threats from unstructured text on the Internet. In this context, we examined the relevance of combining expert knowledge with automatic term extraction to build appropriate Internet search queries for acquiring disease outbreak news. We propose a measure: the number of relevant disease outbreak news items detected as a function of the terms automatically extracted from example Google and PubMed corpora. Because of its recent emergence, we used African swine fever as the example disease. (Author's abstract)
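The proposed measure (relevant outbreak news detected as a function of the extracted terms) can be sketched as a curve over the top-k terms. This is a minimal illustration under stated assumptions: each news item carries a hypothetical gold relevance label, a query is the OR of the k highest-ranked extracted terms, and matching is plain case-insensitive substring search. The `NewsItem` class and `relevant_hits_by_term_count` function are hypothetical names, not the authors' code or data.

```python
from dataclasses import dataclass


@dataclass
class NewsItem:
    text: str
    relevant: bool  # hypothetical gold label: is this a genuine outbreak report?


def relevant_hits_by_term_count(news: list[NewsItem], ranked_terms: list[str]) -> list[int]:
    """For k = 1..len(ranked_terms), count the relevant news items matched by a
    query that ORs together the top-k automatically extracted terms."""
    counts = []
    for k in range(1, len(ranked_terms) + 1):
        query = [term.lower() for term in ranked_terms[:k]]
        hits = sum(
            1
            for item in news
            if item.relevant and any(term in item.text.lower() for term in query)
        )
        counts.append(hits)
    return counts


if __name__ == "__main__":
    corpus = [
        NewsItem("African swine fever outbreak confirmed in pigs", True),
        NewsItem("Wild boar found dead, ASF suspected", True),
        NewsItem("Swine prices rise at market", False),
    ]
    terms = ["african swine fever", "wild boar", "swine"]
    print(relevant_hits_by_term_count(corpus, terms))  # → [1, 2, 2]
```

A flat stretch in the curve (here, adding "swine") indicates a term that brings in no new relevant news, only noise; tracking non-relevant matches alongside would turn this into a precision/recall view.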

Available from: HAL - Université de Franche-Comté, HAL Descartes, Agritrop, HAL-CIRAD